library(tidyverse)
library(readxl)
library(scales)
library(ggmap)
library(maps)
library(usmap)
library(plotly)

EV_Chargers <- read_excel('data/EV Chargers_Last updated 07-30-2021.xlsx', sheet = 2)

EV_Sales <- read_excel('data/New ZEV Sales_Last updated 07-30-2021.xlsx', sheet = 2)

EV_Sales_Zip <- read_excel('data/New ZEV Sales_Last updated 07-30-2021.xlsx', sheet = 3)

Analyze the trends of electric car sales before 2021

#Filter out Hydrogen Cars. Not include 2021 since 2021 has not ended
EV_Sales <- EV_Sales %>%
  filter(FUEL_TYPE != "Hydrogen") %>%
  filter(`Data Year` != 2021)

#Total Car Sales Per Year.
EV_Sales_Year <- EV_Sales %>%
  group_by(`Data Year`) %>%
  summarize(total_sales = sum(`Number of Vehicles`)) %>%
  arrange(desc(`Data Year`))

EV_Sales_Year %>% ggplot(aes(x = `Data Year`, y = total_sales)) + 
  geom_line(colour = "blue") +
  labs(x = "Year",
       y = "Total Sales of Cars",
       title = "Total Sales of Cars Per Year")

Electric cars sales are growing exponentially in California since 2010. Between 2017-2018, there is a huge increase in electric car sales. We will look at why there is such big gap between 2017 and 2018.

#Filter by Brand
EV_Make <- EV_Sales %>%
  group_by(MAKE) %>%
  summarize(total = sum(`Number of Vehicles`)) %>%
  arrange(desc(total))

#Only choose sales from top brands
EV_Make = head(EV_Make, 10)

ggplot(data = EV_Make, aes(x = reorder(MAKE, -total), y = total, fill = MAKE)) + 
  geom_col() +
  labs(x = "Model",
        y = "Total Sales of Car",
        title = "Total Numbers of Electric Cars Sold") + scale_y_continuous(labels = comma) +
  theme(axis.text.x = element_text(angle = 45), legend.position = "none")

Overall, Tesla outperform all other car companies in electric car sales.Tesla’s sales is almost triple than the second place, Chevrolet.

head(EV_Sales_Year, 5)
## # A tibble: 5 × 2
##   `Data Year` total_sales
##         <dbl>       <dbl>
## 1        2020      144099
## 2        2019      145263
## 3        2018      154747
## 4        2017       91464
## 5        2016       71723
EV_Make_Year_Telsa_Model_3 <- EV_Sales %>%
  group_by(MAKE, `Data Year`, MODEL) %>%
  summarize(total = sum(`Number of Vehicles`)) %>%
  arrange(desc(total)) %>%
  filter(MAKE == 'TESLA' & `Data Year` %in% c(2017, 2018) & MODEL == "Model 3")

EV_Make_Year_Telsa_Model_3
## # A tibble: 2 × 4
## # Groups:   MAKE, Data Year [2]
##   MAKE  `Data Year` MODEL   total
##   <chr>       <dbl> <chr>   <dbl>
## 1 TESLA        2018 Model 3 49710
## 2 TESLA        2017 Model 3   840

There is a big increase of car sales from 2017 to 2018. Model 3 caused the increase (almost 50K) from 2017 to 2018. Model 3 is a game changer for Tesla or even the entire market. Since 2018, Tesla sold the most electric car overall.

#Analyze all types of Tesla cars and put it into pie chart
Tesla_Models <- EV_Sales %>%
  filter(MAKE == "TESLA") %>%
  group_by(MODEL) %>%
  summarize(total = sum(`Number of Vehicles`)) %>%
  arrange(desc(total)) %>%
  mutate(total_model = sum(total)) %>%
  mutate(perc = round(total/total_model, 2))

plot_ly(data = Tesla_Models, labels = ~MODEL, values = ~perc, type = "pie", sort = FALSE)

Overall, Model 3 contributes more than 50% of the Tesla sales. Roadster contributed to the 0% of the Tesla market due to supply chain shortages, so it is not available for the market to purchase. Model Y only contributes to 7% of the market because Model Y is the new model that was released in 2019; however the CEO of Tesla expected Model Y to be popular as Model X soon.

Analyze where all car companies stand before the release of Tesla Model 3

#Filter data from 2012 and 2017
EV_Make_Year_before_2017 <- EV_Sales %>%
  group_by(MAKE, `Data Year`) %>%
  summarize(total = sum(`Number of Vehicles`)) %>%
  arrange(desc(total)) %>%
  filter(`Data Year` <=  2017 & `Data Year` >= 2012)

EV_Cars_Sales_Before_2017 <- EV_Make_Year_before_2017 %>%
  group_by(`MAKE`) %>%
  summarize(total_sales = sum(total)) %>%
  arrange(desc(total_sales))

EV_Cars_Sales_Before_2017 <- head(EV_Cars_Sales_Before_2017, 10)

ggplot(data = EV_Cars_Sales_Before_2017, aes(x = reorder(MAKE, -total_sales), y = total_sales, fill = MAKE)) + 
  geom_col() +
  labs(x = "Model",
        y = "Total Sales of Car",
        title = "Total Numbers of Electric Cars Sold before 2018") + scale_y_continuous(labels = comma) +
  theme(axis.text.x = element_text(angle = 45), legend.position = "none")

Compared to the previous bar graph, Tesla is the second place between 2012 and 2017. Chevrolet seems the first place between 2012 and 2017. Tesla Model 3 is a game changer to Tesla which brought Tesal to become the number 1 company that sold the most electric cars.

We choose top 6 brands to compare total sales from each year: Tesla, Toyota, Chevrolet, BMW, NISSAN

EV_Sales_Year_Model <- EV_Sales %>%
  group_by(`Data Year`, MAKE) %>%
  summarize(total_sales = sum(`Number of Vehicles`)) %>%
  filter(MAKE %in% (c("TESLA", "Toyota", "CHEVROLET", "BMW", "NISSAN", "FORD")))  %>%
  filter(`Data Year` >= 2015)

ggplot(data = EV_Sales_Year_Model, aes(x = MAKE, y = total_sales, fill = MAKE)) + 
  geom_col() + facet_wrap(~`Data Year`, nrow=1) +  
  theme(axis.text.x = element_text(angle = 90), legend.position = "none") +
  labs(x = "Model", y = "Total Sales of Cars")

Except Tesla, sales for each brand is not increasing but decreasing, but only Tesla increases, which means that Tesla really dominated the market from Model 3. Additionally, Tesla is the company that made full electric cars, while other five companies built hybrid cars as well. That only means people are purchasing more full electric cars.

Find numbers of EV charging station at each county, and how it could influence electric car sales

#Total EV Chargers in each county
EV_Charger <- EV_Chargers

EV_Charger <- EV_Chargers %>%
  select(c(County, Total)) %>%
  arrange(desc(Total)) %>%
  filter(County != 'Total')

head(EV_Charger, 10)
## # A tibble: 10 × 2
##    County        Total
##    <chr>         <dbl>
##  1 Los Angeles   20193
##  2 Santa Clara   16202
##  3 San Diego      7199
##  4 Orange         5477
##  5 San Mateo      4459
##  6 Alameda        3560
##  7 Sacramento     1852
##  8 San Francisco  1671
##  9 Contra Costa   1461
## 10 Riverside      1441

Analysis: Los Angeles, San Clara, and San Diego are the top 3. Only 4 counties have more than 5000 EV charging stations. LA has the most EV charging stations.

Draw a map visualization to demonstrate which county has the most EV charging station

states <- map_data("state")

ca_df <- subset(states, region == "california")
counties <- map_data("county")
ca_county <- subset(counties, region == "california")
ca_base <- ggplot(data = ca_df, mapping = aes(x = long, y = lat, group = group)) + 
  coord_fixed(1.3) + 
  geom_polygon(color = "black", fill = "gray")

EV_Charger$County <- tolower(EV_Charger$County)

EV_Chargers_County <- EV_Charger %>%
  inner_join(ca_county, by = c("County" = "subregion"))

ditch_the_axes <- theme(
  axis.text = element_blank(),
  axis.line = element_blank(),
  axis.ticks = element_blank(),
  panel.border = element_blank(),
  panel.grid = element_blank(),
  axis.title = element_blank()
  )

Map_EV_Charging_Station <- ca_base + 
      geom_polygon(data = EV_Chargers_County, aes(fill = Total), color = "black") +
      geom_polygon(color = "black", fill = NA) +
      theme_bw() +
   scale_fill_gradient(low = "light blue", high = "red") +
  ditch_the_axes 

Map_EV_Charging_Station 

Map Car Sales

#Total Car Sales Per County
EV_Sales_County <- EV_Sales %>%
  group_by(County) %>%
  summarize(total_sales = sum(`Number of Vehicles`)) %>%
  arrange(desc(total_sales))

EV_Sales_County_1 <- EV_Sales_County
EV_Sales_County_1$County <- tolower(EV_Sales_County_1$County)


EV_Sales_County_1 <- EV_Sales_County_1 %>%
  inner_join(ca_county, by = c("County" = "subregion"))

Map_EV_Sales_County <- ca_base + 
      geom_polygon(data = EV_Sales_County_1, aes(fill = total_sales), color = "black") +
      geom_polygon(color = "black", fill = NA) +
      theme_bw() +
   scale_fill_gradient(low = "light blue", high = "red") +
  ditch_the_axes 

Map_EV_Sales_County

Step: Analyze the relationship between EV charging station and number of electric cars sold

EV_charger_car_sales <- EV_Chargers %>%
  inner_join(EV_Sales_County, by = "County")

#The relationship between EV_Sales_County and EV_Chargers. Compare the proportion of charging_station vs prop_total.using log
ggplot(EV_charger_car_sales, aes(x = log(Total), y = log(total_sales))) + 
  geom_point() + labs(x = "Number of Charging Stations", y = "Total Sales of Cars",
                                     title = "Numbers of Charging Stations vs Total Sales of Cars")

Analysis: There is a positive correlation between EV charging station and number of electric cars sold. We also recommend car companies to target counties that have high EV charging stations and high total car sales, such as Los Angeles, Santa Clara, Orange, etc. If there are more EV charging stations, car owners can easily find one to charge their cars.

Prediction: how many cars can be sold in the next five year. Provide a recommendation to car companies about where to target potential customers

EV_Sales_Year_desc <- EV_Sales_Year %>%
  arrange(`Data Year`) %>%
  filter(`Data Year` >= 2011)

lm.year <- lm(`total_sales` ~ `Data Year`, data = EV_Sales_Year_desc)

future_sales <- data.frame(predict(lm.year))

#next 10 years
Year <- seq(2021,2029)

future_sales <- data.frame(future_sales)

#added total_sales from 2020
future_sales <- data.frame(Year, head(future_sales,9) + 144099)

#change the column names
colnames(future_sales) <- c("Data.Year", "total_sales")

EV_Sales_Year <- data.frame(EV_Sales_Year)

#combine the past and the predicted data
EV_Sales_Future_Current <- rbind(EV_Sales_Year, future_sales) %>%
  arrange(desc(`Data.Year`))

head(future_sales, 6)
##   Data.Year total_sales
## 1      2021    146912.9
## 2      2022    163938.9
## 3      2023    180964.8
## 4      2024    197990.7
## 5      2025    215016.6
## 6      2026    232042.6
EV_Sales_Future_Current <- EV_Sales_Future_Current %>%
  filter(`Data.Year` <= 2025)

EV_Sales_Future_Current %>% ggplot(aes(x = `Data.Year`, y = total_sales)) + 
  geom_line(colour = "blue") +
  labs(x = "Year",
       y = "Total Sales of Cars",
       title = "Total Sales of Cars Per Year")

We used linear regression model to predict electric car sales in the next five years. Our independent variable is year, and dependent variable is sales. However, there can be some policies that can affect the popularity of electric cars, or the shortages of materials to build the cars. Thus, we do not want to predict for the next 10 years or so.

Analyze car sales by zip code

#Filter by Car Zip. Find out why 
EV_Sales_Zip_Top_20 <- EV_Sales_Zip %>%
  group_by(ZIP) %>%
  summarize(total = sum(`Number of Vehicles`)) %>%
  arrange(desc(total)) %>%
  head(20)

EV_Sales_Zip_Top_20
## # A tibble: 20 × 2
##      ZIP total
##    <dbl> <dbl>
##  1 94539  7200
##  2 95014  6326
##  3 94568  5402
##  4 92618  5365
##  5 95120  5236
##  6 95035  4892
##  7 95070  4810
##  8 94087  4571
##  9 92620  4554
## 10 94536  4441
## 11 92130  4356
## 12 94582  4140
## 13 90266  4102
## 14 94025  3995
## 15 90049  3969
## 16 95124  3968
## 17 95125  3923
## 18 94555  3609
## 19 94538  3582
## 20 95051  3546

8 zips are from Santa Clara County, and 2 from Irvine (Orange County). We believe there is one or two zip codes from LA because LA is more diverse county. There are a few ways that car companies can target their customers by using our analysis. They can target their customers from zip codes that have high electric car sales, or target zip codes that have less electric cars but living in affluent communities, which requires more analysis.

To further expand this analysis, we can analyze what kinds of people purchase electric cars since not everyone can afford to purchase electric cars. Analyzing car sales by counties might be too generic because each city is different from other cities. Therefore, we can take consideration of people’s income in each city for car companies in order to target their consumers.